perf(decimal): SIMD kernels for d64/d128/d256 and SUM(decimal) reduction by aunjgr · Pull Request #24257 · matrixorigin/matrixone

aunjgr · 2026-04-29T16:01:20Z

What type of PR is this?

Which issue(s) this PR fixes:

What this PR does / why we need it:

Add a new `pkg/common/simdkernels` package providing SIMD-accelerated kernels for decimal arithmetic, gated by `goexperiment.simd` (AVX2 required, AVX-512 used opportunistically when available):

- d64: add/sub (vector & broadcast, checked & unchecked), compare, multiply helper, scale (×10^k)
- d128: add/sub (vector & broadcast, checked & unchecked), neg/abs, sign-extension from d64 (amd64 asm with prefetch; pure-Go fallbacks for arm64 and other archs)
- d256: add/sub, neg/abs
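For readers unfamiliar with the d64 → d128 sign-extension step, here is a minimal pure-Go sketch of what a fallback could look like. The `Decimal64` and `Decimal128` types and the `signExtendD64` function are hypothetical stand-ins chosen for illustration; they are not the package's actual definitions.

```go
package main

import "fmt"

// Decimal64 and Decimal128 are hypothetical stand-ins, not the engine's types:
// a two's-complement 64-bit value and a pair of 64-bit words.
type Decimal64 uint64

type Decimal128 struct {
	Lo uint64 // bits 0..63
	Hi uint64 // bits 64..127
}

// signExtendD64 widens each 64-bit decimal to 128 bits by copying the low
// word and filling the high word with the sign bit (all ones for negatives).
// dst must be at least as long as src.
func signExtendD64(dst []Decimal128, src []Decimal64) {
	for i, v := range src {
		dst[i].Lo = uint64(v)
		dst[i].Hi = uint64(int64(v) >> 63) // arithmetic shift yields 0 or all ones
	}
}

func main() {
	src := []Decimal64{Decimal64(5), Decimal64(^uint64(0))} // 5 and -1
	dst := make([]Decimal128, len(src))
	signExtendD64(dst, src)
	fmt.Printf("%#x %#x\n", dst[0].Hi, dst[1].Hi)
}
```

The loop has no data dependency between iterations, which is the property that makes this operation a natural fit for a vectorized kernel.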

Wire the kernels into hot paths:

- `func_cast.go`: d64 → d128 cast uses `Decimal64SignExtend`
- `arith_decimal_fast.go`: d128 add/sub broadcast paths use the SIMD vector/scalar+vector/vector+scalar kernels
- `aggexec/sum_decimal_fast.go`: SUM(decimal64) and SUM(decimal128) reduction uses the SIMD sum-reduce kernel for runs ≥ 32 elements
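The length-based dispatch for the SUM reduction can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `acc128`, `sumScalar`, `sumLanes`, and `sumDecimal64` are hypothetical names, and `sumLanes` uses four independent scalar accumulators as a stand-in for the real SIMD kernel. Only the threshold of 32 comes from the description above.

```go
package main

import (
	"fmt"
	"math/bits"
)

// simdThreshold mirrors the "runs ≥ 32 elements" cutoff from the description.
const simdThreshold = 32

// acc128 is a hypothetical 128-bit two's-complement accumulator.
type acc128 struct{ lo, hi uint64 }

// add folds one signed 64-bit value into the accumulator with carry.
func (a *acc128) add(v int64) {
	var carry uint64
	a.lo, carry = bits.Add64(a.lo, uint64(v), 0)
	a.hi, _ = bits.Add64(a.hi, uint64(v>>63), carry) // v>>63 sign-extends v
}

// merge adds another accumulator into this one.
func (a *acc128) merge(b acc128) {
	var carry uint64
	a.lo, carry = bits.Add64(a.lo, b.lo, 0)
	a.hi, _ = bits.Add64(a.hi, b.hi, carry)
}

// sumScalar is the plain loop used for short runs and tails.
func sumScalar(vals []int64) acc128 {
	var acc acc128
	for _, v := range vals {
		acc.add(v)
	}
	return acc
}

// sumLanes stands in for the SIMD kernel: four independent lanes are
// accumulated in parallel and merged once at the end.
func sumLanes(vals []int64) acc128 {
	var lanes [4]acc128
	n := len(vals) &^ 3 // largest multiple of 4 not exceeding len(vals)
	for i := 0; i < n; i += 4 {
		lanes[0].add(vals[i])
		lanes[1].add(vals[i+1])
		lanes[2].add(vals[i+2])
		lanes[3].add(vals[i+3])
	}
	total := sumScalar(vals[n:]) // leftover tail elements
	for _, l := range lanes {
		total.merge(l)
	}
	return total
}

// sumDecimal64 picks the wide path only when the run is long enough.
func sumDecimal64(vals []int64) acc128 {
	if len(vals) >= simdThreshold {
		return sumLanes(vals)
	}
	return sumScalar(vals)
}

func main() {
	vals := make([]int64, 40)
	for i := range vals {
		vals[i] = int64(i + 1)
	}
	fmt.Println(sumDecimal64(vals)) // prints {820 0}
}
```

Accumulating into 128 bits means the per-element additions cannot overflow for any realistic run length, so the reduction needs no per-element overflow check.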

`Makefile` now passes `GOEXPERIMENT=simd` to `go build` so the kernels are enabled by default.

TPC-H SF100 wall-time wins (Zen 3, 24-core, median of 5):

| Query | Baseline | This change | Δ |
|-------|----------|-------------|--------|
| Q1 | 12.78s | 11.98s | -6.3% |
| Q5 | 4.13s | 3.43s | -16.9% |
| Q9 | 12.51s | 10.70s | -14.4% |
| Q14 | 2.67s | 2.26s | -15.4% |

qodo-code-review · 2026-04-29T16:01:24Z

ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Int64x8.AndNot has inverted operand semantics compared to Int64x4.AndNot on Go 1.26.2 (VPANDNQ computes ~receiver & arg rather than receiver & ~arg). This caused all AVX-512 checked-add overflow detection to silently miss overflows, returning -1 instead of the overflow index.

Fix: swap operands in all 6 AVX-512 AddChecked functions (d64, d128, d256 vector and scalar-broadcast variants).

Also replace custom itoa() with strconv.Itoa to fix build without GOEXPERIMENT=simd (d64_compare_test.go referenced itoa from a build-tagged file).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
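To illustrate why operand order matters in an and-not based overflow check, here is a scalar sketch using Go's built-in `&^` operator (`x &^ y` is `x & ~y`). It uses one common sign-bit formulation of signed-add overflow; the PR's kernels may use a different formula, and `addCheckedD64`, `overflowMask`, and `swappedMask` are hypothetical names for illustration only.

```go
package main

import "fmt"

// overflowMask has its sign bit set iff a+b overflowed: a and b share a sign
// (sign bit of a^b is clear) and sum's sign differs from a's (sign bit of
// a^sum is set).
func overflowMask(a, b, sum int64) int64 {
	return (a ^ sum) &^ (a ^ b)
}

// swappedMask is the same expression with the and-not operands exchanged,
// i.e. the kind of result an inverted-operand AndNot would produce.
func swappedMask(a, b, sum int64) int64 {
	return (a ^ b) &^ (a ^ sum)
}

// addCheckedD64 writes a[i]+b[i] into dst and returns the index of the first
// signed overflow, or -1 if none occurred.
func addCheckedD64(dst, a, b []int64) int {
	for i := range a {
		sum := a[i] + b[i] // wraps on overflow
		dst[i] = sum
		if overflowMask(a[i], b[i], sum) < 0 {
			return i
		}
	}
	return -1
}

func main() {
	a := []int64{1, 1<<63 - 1, -5}
	b := []int64{2, 1, 5}
	dst := make([]int64, len(a))
	fmt.Println(addCheckedD64(dst, a, b)) // prints 1

	// With swapped operands the real overflow at index 1 goes undetected.
	sum := a[1] + b[1]
	fmt.Println(swappedMask(a[1], b[1], sum) < 0) // prints false
}
```

A test that feeds a known overflowing pair through both the 256-bit and 512-bit code paths would catch this class of bug regardless of which formula the kernel uses.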

aunjgr requested review from XuPeng-SH and ouyuanning as code owners April 29, 2026 16:01

matrix-meow added the size/XXL label (denotes a PR that changes 2000+ lines) Apr 29, 2026

mergify Bot added the kind/enhancement label Apr 29, 2026

aunjgr marked this pull request as draft April 29, 2026 16:06

aunjgr marked this pull request as ready for review April 30, 2026 03:53

aunjgr requested a review from Copilot April 30, 2026 03:55

Copilot started reviewing on behalf of aunjgr April 30, 2026 03:55

aunjgr marked this pull request as draft April 30, 2026 03:56

aunjgr force-pushed the decimal-perf branch from 7f0210c to 6f5a2bb April 30, 2026 07:09

aunjgr and others added 3 commits May 5, 2026 01:01

Merge remote-tracking branch 'upstream/main' into decimal-perf

4eb16d9

# Conflicts: # pkg/sql/colexec/aggexec/sum_decimal_fast.go

chore: remove doc.go, add copyright header to asm file

955c3f1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

chore: untrack design docs accidentally added by merge

6dd1691

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

aunjgr wants to merge 5 commits into matrixorigin:main from aunjgr:decimal-perf

3 participants
